An efficient and scalable plagiarism checking system using Bloom filters

نویسندگان

  • Shahabeddin Geravand
  • Mahmood Ahmadi
چکیده

With the easy access to the huge volume of articles available on the Internet, plagiarism is getting worse and worse. Most recent approaches proposed to address this problem usually focus on achieving better accuracy of similarity detection process. However, there are some real applications where plagiarized contents should be detected without revealing any information. Moreover, in such web-based applications, running time, memory consumption, communication and computational complexity should be also taken into account. In this paper, we propose a similar document detection system based on matrix Bloom filter, a new extension of standard Bloom filter. The experimental on a real dataset show that the system can achieve 98% of accuracy. We also compare our approach with a method recently proposed for the same purpose. The results of the comparison show that the Bloom filter-based approach achieves much better performance than other in terms of the aforementioned factors. 2014 Elsevier Ltd. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Cuckoo Filter Modification Inspired by Bloom Filter

Probabilistic data structures are so popular in membership queries, network applications, and so on. Bloom Filter and Cuckoo Filter are two popular space efficient models that incorporate in set membership checking part of many important protocols. They are compact representation of data that use hash functions to randomize a set of items. Being able to store more elements while keeping a reaso...

متن کامل

Bloom Filters in Probabilistic Verification

Probabilistic techniques for verification of finite-state transition systems offer huge memory savings over deterministic techniques. The two leading probabilistic schemes are hash compaction and the bitstate method, which stores states in a Bloom filter. Bloom filters have been criticized for being slow, inaccurate, and memory-inefficient, but in this paper, we show how to obtain Bloom filters...

متن کامل

Bloom Filters & Their Applications

A Bloom Filter (BF) is a data structure suitable for performing set membership queries very efficiently. A Standard Bloom Filter representing a set of n elements is generated by an array of m bits and uses k independent hash functions. Bloom Filters have some attractive properties including low storage requirement, fast membership checking and no false negatives. False positives are possible bu...

متن کامل

Reducing False Positives of a Bloom Filter using Cross-Checking Bloom Filters

A Bloom filter is a compact data structure that supports membership queries on a set, allowing false positives. The simplicity and the excellent performance of a Bloom filter make it a standard data structure of great use in many network applications. In reducing the false positive rate of a Bloom filter, it is well known that the size of a Bloom filter and accordingly the number of hash indice...

متن کامل

Scalable Bloom Filters

Bloom Filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability, being impossible to store extra elements without increasing the false positive probability. This leads typically to a conservative assumpti...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computers & Electrical Engineering

دوره 40  شماره 

صفحات  -

تاریخ انتشار 2014